University of Texas at San Antonio



**Open Cloud Institute**


Machine Learning/BigData EE-6973-001-Fall-2016


**Paul Rad, Ph.D.**

**Ali Miraftab, Research Fellow**



**Anomalous Behaviour Detection using Machine Learning**


Gonzalo De La Torre, Vivek Sarkale
*Open Cloud Institute, University of Texas at San Antonio, San Antonio, Texas, USA*
gonzalo.delatorreparra@utsa.edu, prn180@utsa.edu



**Project Definition:** The purpose of this project is to create a deep learning model capable of distinguishing anomalous network traffic from normal network traffic. The model should give early warning of unusual behaviour by learning characteristics from normal data sets and flagging unseen patterns. Project development has two major steps:

  1. Gather data sets with anomalous and normal behaviour

  2. Develop a deep learning model to make the decision

We have gathered data sets that include data packets with anomalous as well as normal behaviour: 53.5% of the data is anomalous and 46.5% is normal. The anomalous data is split into 42.8% of the whole data set for training and 10.7% for testing, which together make up the 53.5% anomalous share. We have studied the parameters that have the strongest effect on the decision-making process. For our data sets, we consider the parameters listed below to be the major ones. The percentage next to each parameter is its effectiveness as reported by a previous study performed at Aberystwyth University in the UK.

| Parameter          | Weight (effect in %) |
|--------------------|----------------------|
| i) Index           | 29                   |
| ii) URL            | 12.6                 |
| iii) Payload       | 13.8                 |
| iv) Cookies        | 9.96                 |
| v) Content length  | 10.2                 |

Based on this dataset, we will proceed to develop our algorithm.
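The split percentages above can be checked with a few lines of arithmetic. The sketch below assumes that 42.8% and 10.7% are shares of the whole data set (they sum to the 53.5% anomalous share); the function name `anomalous_split` and the record count of 1000 are hypothetical and used only for illustration.

```python
# Shares of the whole data set, in percent, as stated in the text.
ANOMALOUS_PCT = 53.5
NORMAL_PCT = 46.5
ANOMALOUS_TRAIN_PCT = 42.8
ANOMALOUS_TEST_PCT = 10.7


def anomalous_split(total_records: int) -> dict:
    """Return approximate record counts for the anomalous train/test subsets.

    Assumes both percentages are relative to the whole data set.
    """
    train = round(total_records * ANOMALOUS_TRAIN_PCT / 100)
    test = round(total_records * ANOMALOUS_TEST_PCT / 100)
    return {"train": train, "test": test}


# Fraction of the anomalous data that goes to training (about 0.8).
train_fraction = ANOMALOUS_TRAIN_PCT / ANOMALOUS_PCT

# Hypothetical data set size, for illustration only.
counts = anomalous_split(1000)
print(counts, round(train_fraction, 3))
```

Under this reading, the anomalous data is divided roughly 80/20 between training and testing.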


**Outcome:**

The expected outcomes include:

  1. Automated detection of abnormal traffic

  2. Capability to implement the algorithm in a cloud testbed

**Dataset:** The dataset is based on the CSIC 2010 HTTP dataset (http://isi.csic.es/dataset/). CSIC created the original dataset of HTTP/1.1 packets, including web application penetration testing packets, and provided it with two labels (normal and anomalous). The dataset to be used is the same one used by Aberystwyth University in the UK and can be found here:

http://users.aber.ac.uk/pds7/csic_dataset/csic2010http.html
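As a rough illustration of how the parameters named earlier (URL, payload, cookies, content length) could be pulled out of a single HTTP/1.1 request, here is a minimal sketch using only the Python standard library. It assumes each record is available as raw request text; the function `extract_features` and the sample request are hypothetical and are not taken from the dataset. The "Index" parameter is left out because the text does not define how it is derived.

```python
from urllib.parse import urlsplit


def extract_features(raw_request: str) -> dict:
    """Extract a few per-request fields from raw HTTP/1.1 request text."""
    # Normalise line endings, then separate the header section from the body.
    text = raw_request.replace("\r\n", "\n")
    head, _, body = text.partition("\n\n")
    lines = head.splitlines()

    # Request line: METHOD TARGET VERSION
    method, target, _version = lines[0].split(" ", 2)

    # Header names are case-insensitive, so store them lower-cased.
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()

    parts = urlsplit(target)
    # Treat the body as the payload; fall back to the query string if empty.
    payload = body.strip() or parts.query

    return {
        "method": method,
        "url": parts.path,
        "payload": payload,
        "cookies": headers.get("cookie", ""),
        "content_length": int(headers.get("content-length", 0)),
    }


# Made-up request, for illustration only.
SAMPLE = (
    "POST http://localhost:8080/shop/login.jsp HTTP/1.1\n"
    "Cookie: JSESSIONID=ABC123\n"
    "Content-Type: application/x-www-form-urlencoded\n"
    "Content-Length: 21\n"
    "\n"
    "user=alice&pwd=s3cret"
)

features = extract_features(SAMPLE)
print(features)
```

A dictionary per request keeps the fields named and easy to inspect; turning them into numeric inputs for a model is a separate step not shown here.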